ABSTRACT

We estimate a vehicle’s speed, width, and length by jointly
estimating its acoustic wave-pattern using a single passive acoustic
sensor that records the vehicle’s drive-by noise. The acoustic
wave-pattern is estimated using three envelope shape (ES) components,
which approximate the shape of the received signal’s power envelope.
We incorporate the parameters of the ES components along with
estimates of the vehicle engine RPM and number of cylinders to form a
vehicle profile vector. This vector provides a compressed statistic
that can be used for vehicle identification and classification.
Vehicle speed estimation and classification results are provided
using field data.

1.  INTRODUCTION 

Estimation of vehicle motion parameters using signals received at
passive sensors is a classical signal processing problem [1-6]. When
a single passive acoustic sensor is used, wave propagation effects
are used to determine the source movements based on the assumptions
that the vehicle A) is a point source [1,2], B) has stationary signal
characteristics that admit a model such as an autoregressive moving
average (ARMA) model [2], or C) produces a pure tone [1]. These
assumptions are only partially satisfied by vehicles; hence, the
estimation algorithms based on these assumptions do not perform as
expected when they are applied to field data.

When an array of passive acoustic sensors is used, existing
approaches in the literature concentrate on the correlation among the
multiple microphone signals. Forren and Jaarsma [4] aim to classify
vehicles based on their axle detections by exploiting the tire noise
generated by vehicles. They use signal correlations among three known
microphones under assumption B. However, they do not model any
interference effects of the tires as discussed in this paper.
Valcarce et al. [3] exploit the differential time delays to estimate
the speed by assuming A and B. They use additive Gaussian noise
models and obtain biased speed estimates, as in [2]. Lo and Ferguson
[5] develop a nonlinear least squares method for speed estimation
using a quasi-Newton method for computational efficiency. The
estimated speed is based on time-delay-of-arrival estimates under
assumptions A and B. Similar to [2,3], a negative bias in the
estimates is also noted in their field tests, which also involve
helicopters as targets [5].

In this paper, we provide a power-based algorithm for vehicle speed
estimation using a single microphone. We describe the spectral and
spatial content of vehicle signals and recast the speed estimation
problem as a spatial acoustic pattern recognition problem. We
calculate the received signal’s power envelope and approximate it
using three envelope shape (ES) components. The ES components
spatially decompose the total vehicle noise into parts that also
account for tire interference effects, tire horn effects, and air
turbulence effects, which are not considered in the current
literature. For estimation, we introduce a vehicle profile vector
that characterizes the ES components and also includes classifying
vehicle information such as the engine revolutions per minute (RPM)
and the number of cylinders. The vehicle profile vector can be
thought of as a fingerprint of the vehicle.

Our motivation for the vehicle profile vector is also the acoustic
correspondence problem: given recorded measurements of two vehicles
(calibration recordings), we would like to determine, with high
confidence, the label of the vehicle when it drives by another
control microphone. This problem has applications in distributed
sensor networks [7,8]. The problem becomes complicated when 1) the
control microphone has a different distance to the closest point of
approach (CPA) of the vehicle, 2) the vehicle is moving with a
different speed or moving on a different medium (e.g., gravel as
opposed to asphalt), 3) it is raining or has rained, or 4) the
vehicle is or was significantly loaded. In this paper, we comment on
how we can tackle the correspondence problem using the vehicle
profile vector.

Report Documentation Page (Standard Form 298, Rev. 8-98)

Report date: 01 NOV 2006
Report type: N/A
Title: Vehicle Fingerprinting Using Drive-By Sounds
Performing organization: Center for Automation Research, University
of Maryland, College Park, MD 20742
Distribution/availability: Approved for public release, distribution
unlimited
Supplementary notes: See also ADM002075. The original document
contains color images.
Security classification (report, abstract, this page): unclassified
Number of pages: 8

2. VEHICLE SIGNAL’S SPECTRAL AND SPATIAL CONTENT

A vehicle’s acoustic signal consists of a combination of various
noise signals generated by the engine, the tires, the exhaust system,
aerodynamic effects, and mechanical effects (e.g., axle rotation,
brake pads, and suspension). Hence, the spectral content of a
vehicle’s signal includes wideband processes as well as harmonic
components. It also has a spatial distribution because the noise
sources are at different locations on the vehicle. The mixture
weighting of these spectral components at any given location depends
on the vehicle’s speed, whether the vehicle is accelerating,
decelerating, or turning, and whether the vehicle is in good
mechanical condition. In general, one can approximate a vehicle’s
signal as consisting of four noise components:

Engine Noise: The noise from an internal combustion engine contains a
deterministic harmonic train and a stochastic component, similar to
human speech [9,10]. The stochastic component of the engine noise is
largely due to the turbulent air flow in the air intake (or
intercooler), the engine cooling systems, and the alternator fans.
This stochastic component is wideband in nature. The deterministic
component is caused by the fuel combustion in the engine cylinders
and has more power than the stochastic component. The lowest
deterministic tone is called the cylinder fire rate $f_0$, defined as
the firing rate of any one cylinder in the engine. Since each
cylinder fires once every two engine revolutions in a four-stroke
engine, there is a simple relationship between $f_0$ (in Hz) and the
RPM $\chi$ of a vehicle:

$$f_0 = \frac{\chi}{120}. \tag{1}$$

The strongest tone in the engine noise is called the engine fire rate
$F_0$, and it is related to $f_0$ in a simple manner:

$$F_0 = f_0 \times p, \tag{2}$$

where $p$ is the number of cylinders in the engine. One can think of
$F_0$ and its integer multiples as the formant frequencies in human
speech. The expressions for $f_0$ and $F_0$ model the reality quite
well; however, small deviations do occur. For example, in modern
cars, each cylinder is individually controlled by an engine
management system, which might vary $f_0$ and $F_0$ to optimize fuel
consumption or torque. Hence, in some cases, the locations of the
$f_0$- and $F_0$-harmonics might provide a fingerprint for the
specific engine [9].
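The relations between the RPM, the cylinder fire rate, and the engine fire rate can be sketched in a few lines of Python. This is only an illustration: it assumes a four-stroke engine (one firing per cylinder every two revolutions, so the cylinder fire rate in Hz is the RPM divided by 120), and the function names and the example values are hypothetical rather than taken from the field data.

```python
def cylinder_fire_rate(rpm: float) -> float:
    """Cylinder fire rate f0 in Hz, assuming a four-stroke engine.

    Each cylinder fires once every two revolutions, so there are rpm / 2
    firings per minute, i.e. rpm / 120 firings per second.
    """
    return rpm / 120.0


def engine_fire_rate(rpm: float, cylinders: int) -> float:
    """Engine fire rate F0 = f0 * p in Hz, with p the number of cylinders."""
    return cylinder_fire_rate(rpm) * cylinders


if __name__ == "__main__":
    rpm, p = 2400.0, 6  # illustrative values only
    print("f0 [Hz]:", cylinder_fire_rate(rpm))
    print("F0 [Hz]:", engine_fire_rate(rpm, p))
```

Inverting the same relations gives a rough RPM estimate from a measured fire rate, provided the number of cylinders is known.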

Car manufacturers try to suppress the engine noise as much as
possible for the passengers’ comfort inside the vehicle cabin in the
frequency range that human ears are most sensitive to (1 kHz to
4 kHz) [10]. In addition, the manufacturers try to suppress the noise
levels outside the car as mandated by the federal standards for
highway noise (e.g., in the US, see [11,12]). They design quieter
engines and also exploit the body of the vehicle to filter the engine
noise. To achieve this, the interior of the engine compartment is
usually treated with material for acoustical attenuation (the
metallic shell also acts as a filter). Hence, in some cases, the
engine noise might be stronger on the side and at the very front of
the car than in other directions, because sound propagation through
the axle, the front grill, and the bottom of the engine block cannot
be filtered effectively.

Tire Noise: The term tire noise is defined as the noise emitted from
a rolling tire as a result of its interaction with the road surface.
The tire noise is the main source of a vehicle’s total noise above
50 km/h [13]. It consists of two components: vibrational noise and
air noise [14,15]. The vibrational component is caused by the contact
between the tire treads and the pavement texture. Its spectrum is
most dominant in the 100-1000 Hz frequency range. The air noise is
generated by the air being sucked in or forced out of the rubber
blocks of a tire and is dominant in the frequency range between 1000
and 3000 Hz. The actual frequency calculations are complicated by the
tread geometry [16].

In the driving direction of the car, the road and the tire form a
geometrical structure that amplifies the noise generated by the
tire-road interaction [15,17,18]. This effect is called the horn
effect and has a directional pattern [17]. This amplification results
in a strong vehicle tire noise component at far distances in the
frequency range 600-2000 Hz ([15], Chapter 7.1.25). The directivity
of the horn effect depends on the tire width and radius, the tire
shoulders, the tire tread geometry, as well as the weight and torque
on the tire. Analytical calculations based on these factors are
rather difficult, and hence, numerical approaches such as boundary
element methods are used to simulate the horn effect for a given tire
configuration [17,18]. Notably, most of the total tire noise power,
including the horn effect, lies between the frequencies of
700-1300 Hz with a multi-coincidence peak around 1000 Hz [15].

Exhaust Noise: The exhaust system consists of the exhaust manifold,
catalytic converter, resonator, exhaust pipe, muffler, and the tail
pipe. The system runs from the engine compartment to the back of the
car, generating the exhaust noise. Due to the system’s spatial
distribution, this noise is less prominent in the front of a vehicle.
Unlike the engine block noise, the exhaust system noise increases
significantly with the engine load. The exhaust noise is also
affected by engine turbochargers/superchargers and after-coolers
[19,20].

Manufacturers use a combination of reactive and absorptive silencers
to keep the exhaust noise level down. The exhaust noise has broadband
characteristics with most of its power concentrated around the low
frequencies. It has the same harmonic frequency structure as the
engine and additional tail pipe resonances that occur at a
fundamental frequency of $f_e = c/(2l)$, where $l$ is the tail pipe
length and $c$ is the speed of sound [19,20].
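As a quick numerical illustration of the tail pipe resonance formula, the sketch below evaluates the fundamental frequency for a given pipe length. The default speed of sound (343 m/s, roughly air at room temperature) and the example length are assumptions for illustration; exhaust gas is hotter, so the actual value of the speed of sound in a tail pipe would differ.

```python
def tail_pipe_fundamental(length_m: float, speed_of_sound: float = 343.0) -> float:
    """Fundamental tail pipe resonance f_e = c / (2 l) in Hz.

    length_m: tail pipe length l in meters.
    speed_of_sound: c in m/s (assumed default: air at room temperature).
    """
    if length_m <= 0.0:
        raise ValueError("tail pipe length must be positive")
    return speed_of_sound / (2.0 * length_m)


if __name__ == "__main__":
    # Illustrative length only; not a measured value.
    print("f_e [Hz]:", tail_pipe_fundamental(0.5))
```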

Air Turbulence Noise: Vehicle-induced turbulence can become an
important factor in the overall perceived loudness of a vehicle as
the vehicle speed increases. This noise is due to air flow generated
by the boundary layer of the vehicle and is prominent immediately
after the vehicle passes by the sensor (heard as a distinctive whoosh
sound). The turbulence noise depends on the aerodynamics of the
vehicle as well as the ambient wind speed and its orientation
[21,22]. In our problem, we only consider the case when the wind
speed is much less than the vehicle speed. For this case,
perturbation analysis methods can yield analytical expressions for
the mean and the variance of the turbulent velocity components [23].
These expressions may be used to further improve our results in this
paper.


Fig.  1.  Dipole  geometry.  When  the  dipole  sources  are  correlated, 
the  resulting  wave  propagation  effect  on  the  received  signal  power 
is  not  a  superposition  of  individual  monopole  effects. 


3.  INTERFERENCE  PHENOMENA 


Let $s(t)$ be a zero-mean i.i.d. acoustic signal emitted by the
monopole source. To simplify the results, we concentrate on the
following special case, where the Fourier transform of the source
signal is assumed to be bandlimited as follows:

$$|S(\Omega)| = \begin{cases} S, & \Omega_1 < \Omega < \Omega_2, \quad \mathcal{W} = \Omega_2 - \Omega_1, \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

When the source signal has a spatial extent, it is crucial to
consider interference effects while estimating the speed. To
demonstrate the interference effects, consider a dipole source moving
along the x-axis as illustrated in Fig. 1. In this case, the received
signal is the sum of the two source signals that are assumed to be
coherent:

$$z[n] = \sum_{i=1}^{2} \frac{1}{r_i[n]}\, s\!\left(\beta_i[n]\,\frac{n}{F_s} - \frac{r_i[n]}{c}\right), \tag{4}$$

where $\beta_i(t)$ ($i = 1, 2$) is the Doppler shift factor of each
monopole source in the dipole. Under the far field assumptions [24],
one can approximate $r_i \approx r$ and $\beta_i \approx \beta$ as
defined in the monopole source case (Fig. 1).

In the far-field, with the same assumptions for the monopole source,
the Fourier transform $Z(\Omega)$ of the signal $z(t)$ can be written
as

$$Z(\Omega) \approx \frac{S\!\left(\Omega/\beta[n]\right)}{\beta[n]\, r[n]} \left[ 1 + e^{-j \frac{\Omega}{\beta[n]} \frac{r_2[n] - r_1[n]}{c}} \right]. \tag{5}$$

Hence, the received signal bandwidth is modulated as in the monopole
case. However, note that the additional term in the brackets in (5)
plays a crucial role when we look at the average received signal
power:

$$P_z[n] = \frac{S^2 F_s \mathcal{W}}{2\pi\tau}\, \frac{1}{\beta[n]\, r^2[n]}\, \rho[n], \tag{6}$$


Fig. 2. Interference patterns and power functions along x and y
directions for a dipole source with the following parameters:
$v = 20$ m/s, $W = 1.5$ m, $\Omega_1 = 600$ Hz, and
$\Omega_2 = 2000$ Hz. In (b) and (d), the power function is plotted
with (dashed line) and without (solid line) the interference term.
Note the dramatic effect of the interference term on the power
function in (d) along the y direction.


where $\rho[n]$ is called the interference term. When the dipole
source signal is baseband, the interference term has the following
simpler form:

$$\rho[n] = 1 + \operatorname{sinc}(\cdot). \tag{7}$$

The interference term has a special hyperbolic pattern (Fig. 2). In
the far field of the dipole, the interference term is constant along
the asymptotes of the hyperbolas defined as $r_2 - r_1 = 2a$ for
$a \propto \sin\theta$. Moreover, it is well known that the local
extrema of the sinc function correspond to its intersections with the
cosine function. Hence, a minimum and a maximum of the sinc function
are, on average, half the cosine period away from each other.
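One way to see how a "1 + sinc" interference term can arise is to average the interference factor of two equal-amplitude coherent sources over a flat frequency band. The sketch below does this numerically and compares the baseband case with a closed form. It is an illustration under stated assumptions, not the paper's derivation: `delta` is a hypothetical path delay difference in seconds, the spectrum is taken as flat, the factor is normalized as 1 + cos(omega * delta), and the unnormalized convention sinc(x) = sin(x)/x is assumed.

```python
import math


def band_averaged_interference(delta: float, omega1: float, omega2: float,
                               steps: int = 20000) -> float:
    """Average 1 + cos(omega * delta) over omega in [omega1, omega2].

    Uses the midpoint rule. For two equal-amplitude coherent sources,
    |1 + exp(-j * omega * delta)|^2 / 2 equals 1 + cos(omega * delta).
    """
    width = (omega2 - omega1) / steps
    total = 0.0
    for k in range(steps):
        omega = omega1 + (k + 0.5) * width
        total += 1.0 + math.cos(omega * delta)
    return total / steps


def baseband_closed_form(delta: float, omega2: float) -> float:
    """Closed form of the band average when omega1 = 0: 1 + sin(x)/x, x = omega2 * delta."""
    x = omega2 * delta
    return 2.0 if x == 0.0 else 1.0 + math.sin(x) / x


if __name__ == "__main__":
    omega2 = 2.0 * math.pi * 2000.0  # illustrative band edge in rad/s
    for delta in (0.0, 0.25e-3, 1.0e-3):
        print(delta, band_averaged_interference(delta, 0.0, omega2),
              baseband_closed_form(delta, omega2))
```

When the delay difference is zero the two sources add fully in phase and the factor is 2; as the delay grows the factor oscillates around 1 with a decaying sinc-shaped ripple.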


4.  JOINT  ESTIMATION  OF  SPEED  AND  SPATIAL 
ACOUSTIC  PATTERNS 

4.1.  Envelope  Shape  Components 

To determine a vehicle’s speed using acoustic observations from a
single microphone, we jointly estimate the vehicle’s spatial acoustic
pattern. In the previous section, we introduced an interference
effect that creates a part of the total spatial acoustic pattern. We
denote any such component that makes up a vehicle’s spatial acoustic
pattern as an envelope shape (ES) component. Note that, earlier, we
derived the interference effect on the observed acoustic power of the
microphone signal with respect to the source position. However, in
this section, we use the reciprocity theorem and change the reference
frame from the moving vehicle to the microphone to derive the ES
components [25]. For simplicity, we model the ES components using
three piecewise constant functions in dB scale with respect to the
microphone bearing $\varphi$ as illustrated in Fig. 3. We make the
connection between the ES components and the received signal power in
the next subsection.

The first ES component $\rho_\gamma(\varphi)$ in Fig. 3 models the
signal interference due to the front and rear tires, which can be
modeled as dipole sources. This component explains the perturbation
in the envelope function of the vehicle acoustic drive-by signals
around the microphone bearing of $\varphi = 17^\circ$ (Fig. 2). The
interference effects before this angle are ignored because they are
dominated by the microphone noise. In this ES component, the tire
interference decreases between the bearings $\gamma_1$ and
$\gamma_2$, increasing the first component to $\delta_{\gamma,1}$.
The angles $\gamma_i$ are related to the width of the car (dipole
separation $W$) through the interference term.

Fig. 3. The microphone bearing reference orientation is defined as
the moving direction of the vehicle. Then, a vehicle’s spatial
acoustic pattern can be approximated by three main components in
$\varphi$. The first component $\rho_\gamma$ is due to the signal
interference from the front and rear tires. The second component
$\rho_\theta$ explains the variation as the microphone comes out of
the horn-effect area of the tires. The third component $\rho_\psi$ is
an approximate component that accounts for a composite
engine/exhaust/tire/turbulence noise effect around the vehicle CPA.

After the drive-by, the tire interference increases between the
bearings $\gamma'_2$ and $\gamma'_1$, decreasing the first ES
component to $\delta_{\gamma,2}$. The parameter $\delta_{\gamma,2}$
is usually close to zero. We note that the component
$\rho_\gamma(\varphi)$ varies in a nonsymmetric fashion with respect
to $\varphi$. The asymmetry is due to the movement of the car:
because of the reference frame change, any angle defined in the
vehicle reference frame, denoted by $\phi$, is related to the angles
in the microphone frame, denoted $\varphi$, through an aberration
relation [26]:


$$\tan\frac{\varphi}{2} = \sqrt{\frac{1 + v/c}{1 - v/c}}\; \tan\frac{\phi}{2}, \tag{8}$$


where the sign of the speed terms flips after the CPA. Hence, by
assuming a symmetric interference pattern for the front and rear
tires of the car based on constant car width, one can relate the
following angle parameters:

$$\tan\frac{\pi - \kappa'}{2} = \left(\frac{1 - v/c}{1 + v/c}\right) \tan\frac{\kappa}{2}, \tag{9}$$

where $\kappa = \gamma_1$ and $\gamma_2$.
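The aberration mapping between the two reference frames can be sketched numerically. The code below assumes the half-angle form tan(mic/2) = sqrt((1 + v/c)/(1 - v/c)) tan(vehicle/2) for the frame change, and, for the bearings before and after the CPA, the form tan((pi - kappa')/2) = ((1 - v/c)/(1 + v/c)) tan(kappa/2). Both forms, the function names, and the default speed of sound are assumptions made for this illustration; angles are in radians and restricted to (0, pi).

```python
import math


def mic_bearing(vehicle_angle: float, v: float, c: float = 343.0) -> float:
    """Map a vehicle-frame angle to a microphone-frame bearing (radians).

    Assumes the half-angle aberration relation with speed v (m/s) and
    speed of sound c (m/s).
    """
    ratio = math.sqrt((1.0 + v / c) / (1.0 - v / c))
    return 2.0 * math.atan(ratio * math.tan(vehicle_angle / 2.0))


def post_cpa_bearing(kappa: float, v: float, c: float = 343.0) -> float:
    """Bearing kappa' after the CPA that pairs with a pre-CPA bearing kappa.

    Solves tan((pi - kappa') / 2) = ((1 - v/c) / (1 + v/c)) * tan(kappa / 2).
    """
    factor = (1.0 - v / c) / (1.0 + v / c)
    return math.pi - 2.0 * math.atan(factor * math.tan(kappa / 2.0))


if __name__ == "__main__":
    kappa = math.radians(17.0)  # illustrative pre-CPA bearing
    for speed in (0.0, 10.0, 20.0):
        print(speed, math.degrees(post_cpa_bearing(kappa, speed)))
```

At zero speed the two bearings are mirror images about the CPA; a nonzero speed pushes the post-CPA bearing further from the mirror position, which is the asymmetry that makes the speed observable from the envelope shape.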

The second ES component $\rho_\theta(\varphi)$ is due to the horn
effect, which was explained in Sect. 2. In the observed signal
envelope, at the microphone bearing $\theta_1$, the horn effect
amplification of the farthest front tire from the microphone starts
to go down, reaching $\delta_{\theta,1}$ at the bearing $\theta_2$,
when the horn effect of the closest rear tire to the microphone also
drops. The differential angle $\theta_2 - \theta_1$ is a good
indicator of the vehicle length, which can be used to compare the
relative sizes of vehicles. To convert the angle difference into
actual size, we use the following approximate relationship

$$\frac{y_m}{\cos\theta_1} - \frac{y_m}{\cos\theta_2} \approx L, \tag{10}$$

where $L$ is the car length. Geometrically, $\theta_2$ is the
microphone bearing of the front of the car after the car fully passes
the line defined by the bearing $\theta_1$.

After the CPA, the horn effect amplifies the tire noise between
bearings $\theta'_2$ and $\theta'_1$, which are related to $\theta_1$
and $\theta_2$ also by the aberration relation (9) (i.e.,
$\kappa = \theta_i$). The final level of the tire noise component
$\delta_{\theta,2}$ is usually different from 0 dB, because the rear
tire curvature is different from the front tire curvature due to the
torque on the tire. Any imbalance of the weight ratio on the front
and rear tires also causes $\delta_{\theta,2}$ to deviate from 0 dB.

Finally, the third ES component $\rho_\psi(\varphi)$ is a composite
component that incorporates (i) engine noise, (ii) exhaust system
noise, (iii) interference pattern of the tires on the side of the
car, and (iv) the noise caused by the air turbulence. To keep the
number of ES components manageable, we approximate the composite
interference pattern as a step function that rises from 0 dB to
$\delta_\psi$ between bearings $\psi_1$ and $\psi_2$. When this
approximation becomes poor, $\delta_{\theta,2}$ of the second ES
component $\rho_\theta(\varphi)$ compensates. We found that the angle
difference $\psi_2 - \psi_1$ is also an indicator of the vehicle
length. Hence, (10) is also used to relate the angles in the third
interference component to the car length $L$.


4.2.  Vehicle  Profile  Vector 

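To jointly determine the speed and the vehicle’s spatial acoustic
pattern, we use the vehicle profile vector $\boldsymbol{\lambda}$,
which is defined as follows:

$$\boldsymbol{\lambda} = [\, \boldsymbol{\lambda}_v \;\; \boldsymbol{\lambda}_\varphi \;\; \boldsymbol{\lambda}_s \;\; \boldsymbol{\lambda}_f \,], \quad \text{where} \tag{11}$$

$$\boldsymbol{\lambda}_v = [\, S \;\; v \;\; W \;\; L \,], \quad \boldsymbol{\lambda}_\varphi = [\, \varphi_0 \;\; \gamma_1 \;\; \theta_1 \;\; \psi_1 \,], \quad \boldsymbol{\lambda}_s = [\, \delta_{\gamma,1} \;\; \delta_{\gamma,2} \;\; \delta_{\theta,1} \;\; \delta_{\theta,2} \;\; \delta_\psi \,], \quad \text{and} \quad \boldsymbol{\lambda}_f = [\, \chi \;\; p \,]. \tag{12}$$

The vector $\boldsymbol{\lambda}_v$ consists of the physical
parameters of the vehicle such as the loudness $S$, speed $v$, car
width $W$, and the car length $L$. The vector
$\boldsymbol{\lambda}_\varphi$ has the initial vehicle bearing
$\varphi_0$ and the angles that define the ES components along with
$\boldsymbol{\lambda}_s$, which contains the amplitude attenuations
and amplifications for the ES components. Lastly, the vector
$\boldsymbol{\lambda}_f$ has the RPM $\chi$ and the number of
cylinders $p$ of the vehicle. The profile vector
$\boldsymbol{\lambda}$ can be viewed as a fingerprint of the vehicle
and can be used for appearance-based tracking and classification.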
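For readers who want to carry the profile vector around in code, one possible representation is a small immutable record that mirrors the four sub-vectors (physical parameters, angles, ES levels, engine parameters). The class name, field names, units, and example values below are hypothetical choices for illustration and are not specified in the paper.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class VehicleProfile:
    """Hypothetical container for the vehicle profile vector."""
    # Physical parameters
    loudness: float       # S
    speed: float          # v, m/s
    width: float          # W, m
    length: float         # L, m
    # Initial bearing and first ES angles, radians
    phi0: float
    gamma1: float
    theta1: float
    psi1: float
    # ES component levels, dB
    delta_gamma1: float
    delta_gamma2: float
    delta_theta1: float
    delta_theta2: float
    delta_psi: float
    # Engine parameters
    rpm: float            # RPM
    cylinders: int        # p

    def as_vector(self) -> List[float]:
        """Flatten the fields, in declaration order, into a list of floats."""
        return [float(x) for x in astuple(self)]


if __name__ == "__main__":
    # Illustrative numbers only; not estimates from the field data.
    profile = VehicleProfile(1.0, 18.7, 1.8, 5.0, 0.2, 0.3, 0.5, 0.9,
                             2.0, 0.0, -3.0, -1.0, 4.0, 2400.0, 6)
    print(profile.as_vector())
```

A flat vector of this kind is convenient as input to a generic optimizer or classifier, while the named fields keep the physical meaning of each entry visible.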


Fig. 4. Drive-by test by a 6-cylinder Chevy Impala moving with
18.7 m/s at an approximate distance of 5.8 m. (a) The acoustic signal
was sampled at $F_s = 48$ kHz. The power envelope $\mathcal{E}$ is
calculated with $\tau = 480$. In the figure, $\sqrt{r}\,\mathcal{E}$
is plotted to emphasize the variation. There is an asymmetry in the
envelope estimates that can be explained by the ES components.
(b) The spectral content of the acoustic signal. Note the strong
interference at 60 Hz. The tire-noise spectrum, which is concentrated
around 700-1300 Hz, does not exhibit a frequency modulation pattern
as predicted by the theory.


4.3.  Amplitude  Observations 

In  this  section,  we  derive  a  relationship  between  the  vehicle  pro¬ 
file  vector  A  and  the  square-root  of  the  average  signal  power, 
which  we  will  denote  as  the  power  envelope.  This  relationship  is 
used  to  determine  the  vehicle  profile  vector  using  standard  maxi¬ 
mum  likelihood  estimation  techniques. 

We define the power envelope function by using $\tau$-discrete
samples of $z[n]$ as follows:

$$\mathcal{E}[n_\tau] = \mathcal{E}(t) = \sqrt{P_z[n_\tau]}, \tag{13}$$

where the subscript $\tau$ under the sample index $n$ implies that
the samples of the continuous function are calculated every
$\tau/F_s$ seconds. The parameter $\tau$ is chosen so that the DFT
coefficients used to calculate the power function at $\tau$-samples
apart are statistically uncorrelated, and hence, each sample of
$\mathcal{E}[n_\tau]$ ($n_\tau = 0, 1, \ldots, N_\tau - 1$) is also
statistically uncorrelated with the others.
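A minimal sketch of a block-wise power envelope is given below. It is a simplification under stated assumptions: the average power is computed directly in the time domain over non-overlapping blocks of `tau` samples, whereas the text refers to DFT coefficients, and the function name is hypothetical. With a sampling rate of 48 kHz and `tau = 480`, each envelope sample would summarize 10 ms of signal.

```python
import math
from typing import List, Sequence


def power_envelope(z: Sequence[float], tau: int) -> List[float]:
    """Square root of the average power over non-overlapping blocks of tau samples.

    Trailing samples that do not fill a complete block are ignored.
    """
    if tau <= 0:
        raise ValueError("tau must be a positive number of samples")
    envelope = []
    for start in range(0, len(z) - tau + 1, tau):
        block = z[start:start + tau]
        mean_power = sum(x * x for x in block) / tau
        envelope.append(math.sqrt(mean_power))
    return envelope


if __name__ == "__main__":
    # Synthetic test tone with a slowly growing amplitude (illustrative only).
    fs, tau = 48000, 480
    signal = [(1.0 + n / fs) * math.sin(2.0 * math.pi * 120.0 * n / fs)
              for n in range(fs // 10)]
    print(power_envelope(signal, tau))
```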

Assuming that the noise acting on the microphone signal $z[n]$ is
zero-mean additive white Gaussian noise with variance $\sigma_w^2$,
we relate the envelope observations to the vehicle profile vector as
follows:

$$\begin{aligned}
\mathcal{E}^2[n_\tau] &\approx \lambda[n_\tau]\, e^{2 m_\tau} + \frac{\sigma_w^2}{\tau}\, w_\tau, \\
\lambda[n_\tau] &= \frac{C}{\beta[n_\tau]\, r^2[n_\tau]}\; 10^{\sum_{i=\gamma,\theta,\psi} \rho_i(\varphi[n_\tau])/10}, \qquad C = \frac{S^2 F_s \mathcal{W}}{2\pi\tau},
\end{aligned} \tag{14}$$

where $\lambda[n_\tau]$ is the directional power variation,
$e^{m_\tau}$ is an i.i.d. multiplicative noise on the signal
amplitude ($m_\tau \sim \mathcal{N}(0, \sigma_m^2)$), $w_\tau$ is an
i.i.d. additive $\chi^2_\tau$ noise (chi-squared distribution with
$\tau$ degrees of freedom) that is also independent of $m_\tau$, and

$$\begin{aligned}
\beta[n_\tau] &= 1 - \frac{v}{c}\cos\varphi[n_\tau], \\
r[n_\tau] &= \sqrt{(v\tau/F_s)^2 + r^2[n_\tau - 1] - 2\,(v\tau/F_s)\, r[n_\tau - 1]\cos\varphi[n_\tau - 1]}, \\
\varphi[n_\tau] &= \varphi[n_\tau - 1] + \sin^{-1}\!\left(\frac{v\tau}{F_s\, r[n_\tau]}\,\sin\varphi[n_\tau - 1]\right), \\
r[0] &= y_m \sec\varphi_0.
\end{aligned} \tag{15}$$

In (15), we implicitly assume that the constant velocity motion
equations are not violated even though the reference frame is changed
from the moving vehicle to the stationary microphone.

Let $\boldsymbol{\mathcal{E}} = [\, \mathcal{E}[0] \;\ldots\; \mathcal{E}[n_\tau] \;\ldots\; \mathcal{E}[N_\tau - 1] \,]$
denote the aggregate envelope observations. Then, the observation
likelihood, given the vehicle profile vector as well as the noise
variances $\sigma_m^2$ and $\sigma_w^2$, can be determined by a
straightforward bivariate transformation followed by marginalization
[27]. Unfortunately, the marginalized data likelihood does not have a
closed form solution and needs to be evaluated numerically. Moreover,
note that the noise variances $\sigma_m^2$ and $\sigma_w^2$ have to
be determined for the evaluation of the likelihood. A joint
estimation of the vehicle profile vector and the noise variances can
be done; however, this increases the numerical complexity. In theory,
these noise variances can be treated as nuisance parameters and can
be integrated out using reference priors [28]. In practice, doing so
further increases the need for numerical integration.
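The constant velocity recursion for the range and bearing, and the resulting directional power, can be sketched as follows. The sketch assumes the law-of-cosines update for the range, the law-of-sines update for the bearing (measured from the direction of motion), a Doppler factor of the form 1 - (v/c) cos(bearing), and a directional power of the form C / (beta r^2) scaled by the summed ES levels in dB. The initial range is passed in explicitly rather than derived from the CPA distance, and all function names and default values are hypothetical.

```python
import math
from typing import List, Tuple


def simulate_track(v: float, r0: float, phi0: float, tau: int, fs: float,
                   n_steps: int, c: float = 343.0
                   ) -> Tuple[List[float], List[float], List[float]]:
    """Propagate range r, bearing phi (radians), and Doppler factor beta.

    The vehicle moves v * tau / fs meters between envelope samples.
    """
    step = v * tau / fs
    r, phi = [r0], [phi0]
    for _ in range(1, n_steps):
        r_prev, phi_prev = r[-1], phi[-1]
        # Law of cosines for the new range.
        r_new = math.sqrt(step ** 2 + r_prev ** 2
                          - 2.0 * step * r_prev * math.cos(phi_prev))
        # Law of sines for the bearing increment seen from the microphone.
        phi_new = phi_prev + math.asin(step * math.sin(phi_prev) / r_new)
        r.append(r_new)
        phi.append(phi_new)
    beta = [1.0 - (v / c) * math.cos(p) for p in phi]
    return r, phi, beta


def directional_power(scale: float, beta: float, r: float, es_db: float) -> float:
    """Noise-free power: scale / (beta * r^2) times 10^(es_db / 10)."""
    return scale / (beta * r ** 2) * 10.0 ** (es_db / 10.0)


if __name__ == "__main__":
    y = 5.8                          # illustrative CPA distance, m
    phi0 = math.radians(10.0)        # illustrative initial bearing
    r, phi, beta = simulate_track(18.7, y / math.sin(phi0), phi0, 480, 48000.0, 300)
    power = [directional_power(1.0, b, rr, 0.0) for b, rr in zip(beta, r)]
    print(max(power), math.degrees(phi[-1]))
```

A useful sanity check on the recursion is that the product of the range and the sine of the bearing, which is the perpendicular distance to the vehicle's path, stays constant along the track.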


4.4.  Frequency  Observations 

The spectral content of a vehicle exhibits directional variation,
making it difficult to use the frequency modulation effects of the
vehicle motion to determine speed. We emphasize that this directional
variation is not due to the motion of the vehicle but is due to tire
noise effects, which are stochastic in nature as discussed in
Sect. 2. The usable frequency tracks for speed estimation are
generated by the engine because the frequency modulation effects can
be observed in the deterministic component of the engine noise. These
deterministic engine frequencies span the 0-250 Hz range at nominal
RPMs (e.g., Fig. 4(b)). At moderate vehicle speeds (30-50 mph), the
full Doppler shift swings $F_0$ by approximately 6%, which also
corresponds to an RPM change of the same amount
($\Delta\chi \approx 200$). Hence, if a driver changes the car’s RPM
by 50 during the vehicle drive-by, there will be a 25% error in $F_0$
(50 out of the equivalent 200 RPM swing) when one assumes a constant
frequency source. We emphasize that this RPM change is unnoticeable
on the dashboard of the vehicle and is likely to happen. On the other
hand, the effect of the same RPM change on the total car loudness is
negligible.

Therefore, determining a probability density function for the vehicle
profile vector by fitting a Doppler shift function to the engine and
tire frequency tracks is an unreliable approach. For example, in [2],
the speed estimation was performed using an autoregressive modeling
of the acoustic signals under a point source assumption. It was
concluded that the Doppler-based speed estimation on the source
frequencies does not perform well with field data [2]. It was also
concluded that, with the same source assumptions, the envelope
measurements yield better speed estimates than the frequency
measurements; however, the speed estimates are nonetheless biased. A
possible reason for this bias, mentioned in [2] and also observed in
[3-5], is the additive Gaussian model as opposed to the
multiplicative noise model that we employ in this paper.

On the other hand, the spectral harmonic content can be used
to determine λ_χ of the vehicle profile vector. Moreover,
conditioned on the λ_v estimate, it is possible to further refine
λ_χ by compensating for the Doppler shifts. The number of cylinders
p is usually the most elusive to estimate because the body of a
vehicle may also act as a filter that directionally suppresses the
frequency at the engine fire rate F0. Hence, it is rather easy to
incorrectly estimate the number of cylinders of a vehicle because
the strongest frequency is not necessarily F0. If a characterization
that is applicable to the vehicles of interest can be done, the
number of cylinders can also be estimated robustly. Estimation of χ
can be done accurately using harmonic analysis methods [29]. In our
estimation, we use the power spectral density of the acoustic signal
to determine λ_χ. Details can be found in [29].
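
A minimal sketch of the cylinder count step, following the procedure noted with Table 1 (pick the strongest spectral peak between 85 and 210 Hz as F0, then divide by the cylinder fire rate, CFR), is given below. It is not the method of [29]: the plain periodogram, the function names, and the four-stroke relation CFR = RPM/120 are assumptions made for illustration, and the CFR (or RPM) estimate is taken as given.

```python
import math


def band_peak_frequency(x, fs, f_lo=85.0, f_hi=210.0):
    """Frequency (Hz) of the largest periodogram value within [f_lo, f_hi]."""
    n = len(x)
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = math.floor(f_hi * n / fs)
    best_k, best_power = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        w = 2.0 * math.pi * k / n
        re = 0.0
        im = 0.0
        for i, sample in enumerate(x):
            re += sample * math.cos(w * i)
            im += sample * math.sin(w * i)
        power = (re * re + im * im) / n  # periodogram value at DFT bin k
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs / n


def cfr_from_rpm(rpm):
    """Cylinder fire rate in Hz, assuming a four-stroke engine
    (each cylinder fires once every two crankshaft revolutions)."""
    return rpm / 120.0


def cylinders_from_peak(f0_hz, cfr_hz):
    """Number of cylinders as the rounded ratio of engine to cylinder fire rate."""
    return round(f0_hz / cfr_hz)
```

As the text notes, this estimate fails whenever the strongest in-band peak is not F0, for example when the vehicle body suppresses the engine fire rate in the direction of the sensor.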


5. EXPERIMENTS

To demonstrate the ideas, data was collected with Fs = 48 kHz on
a two-way street with an omnidirectional microphone placed 1.5 m
off the ground on a pole at the sidewalk. The distance from the
bottom of the microphone pole to the center of the street is 7.4 m.
A video camera is used to establish the ground truth and identify
the vehicles in the test.

5.1. Vehicle Profiling

Table 1 lists the results of the vehicle speed estimates obtained by
three different methods using r = 480 samples: 1) the full vehicle
profile vector using (14), 2) only λ_v using (14), and 3) only λ_v
using the constant velocity motion model on a point source as in
[1,2]. It is seen that the estimates of [2] are improved by
incorporating the multiplicative noise model on the signal
envelope. Estimation using the ES components yields the best
estimates (Figs. 5 and 6). We also estimated the number of cylinders
and the RPM of the vehicles by using the methods in [29]. The
number of cylinders p is estimated after compensating for the
microphone spectral characteristics. However, there was no
compensation for any vehicle directional variation.

Fig. 5. Ford F150. (a) Estimated envelope by the interference
components is shown with the solid line. The dashed line and the
dotted line belong to the additive and multiplicative noise model
results, respectively. (b) Estimated ES components are shown.
According to the ES components, the vehicle is louder in the rear
than in the front. (c) Estimated joint distribution of the vehicle
dimensions around the solution.

Fig. 6. Chevy Impala. (a) Observed envelope exhibits significant
variations. The interference components (solid line) adequately
explain the variations. (b) Estimated ES components are shown.
Parameter 5^ is relatively larger than the other components,
indicating significant air noise. (c) Estimated joint distribution
of the vehicle dimensions around the solution.


5.2.  Vehicle  Classification  Results 

The vehicle profile vector provides a natural basis for classifying
vehicles. Figures 7(a) and (b) show that the vehicles can be
separated into two classes based on their length and size. Note that
the estimated vehicle lengths are not exact vehicle lengths; however,
they can separate compact cars from SUVs or trucks. Figure 7(c)
also illustrates that it is possible to identify loud vehicles such as
vehicles with mechanical problems or heavily loaded SUVs or
pick-up trucks, which are expected to be louder than usual. This
classification is based on the fact that the loudness of the vehicle
has a certain functional distribution as indicated in [14, 15].
Hence, given two similar vehicles, it may be possible to identify
if one of them is heavily loaded or has mechanical problems even
if they move at different speeds.

Table 1. Field Test Results

                 | Ground Truth  | Estimation using λ                   | Using λ_v (§) | Using λ_v (†)
Vehicle          | Vm   v_camera | v      C      L     W(‡)  χ     p(‖) | v      C      | v      C
                 |      (m/s)    | (m/s)         (m)   (m)              | (m/s)         | (m/s)
Ford F150        | 6.3  17.54    | 17.86  12.60  5.38  1.30  3038  8    | 28.00  24.08  | 21.39  21.27
Chevy Impala     | 5.8  18.68    | 18.60  9.23   2.58  1.75  3300  6    | 18.29  11.55  | 15.05  10.90
Honda Accord     | 4.3  16.74    | 14.44  6.86   3.28  1.40  3074  6    | 17.34  10.17  | 14.49  9.67
Nissan Maxima*   | 4.6  13.32    | 13.20  12.45  3.28  1.50  3825  6#   | 14.23  14.86  | 14.27  14.49
Nissan Maxima*   | 4.1  4.14     | 4.49   6.34   2.58  1.50  3150  4    | 4.75   9.90   | 3.46   9.20
Isuzu Rodeo      | 8.1  13.44    | 13.89  7.87   5.20  1.35  3450  6    | 11.32  7.97   | 11.79  7.95
Mercedes E       | 8.1  13.94    | 13.80  7.68   2.93  1.50  3075  6    | 15.51  10.47  | 11.78  9.93
Volvo 850 SW     | 8.1  14.11    | 14.69  9.60   3.10  1.40  2250  10#  | 12.93  9.01   | 11.22  8.63
Nissan Frontier  | 4.3  17.56    | 17.84  9.31   4.85  1.40  2625  6    | 17.02  9.92   | 17.56  9.74
VW Passat        | 5.1  11.66    | 11.58  6.06   2.75  1.80  1950  6    | 8.58   6.06   | 8.66   6.11
Error STD (m/s)  |               | 0.8246                               | 3.7203        | 2.2627
Error STD¶ (m/s) |               | 0.2777                               | 1.5154        | 1.5126
Bias (m/s)       |               | -0.0735                              | 0.6845        | -1.1458
Bias¶ (m/s)      |               | 0.1737                               | -0.4017       | -1.7013

§ Using the method in [2] with the multiplicative noise model introduced here.
† Using the method in [2] without any change.
‡ A fixed bandwidth of W = 600 Hz is used to determine the car widths.
‖ Estimated by finding the frequency F0 with the maximum power spectral density between 85 and 210 Hz and then dividing F0 by the CFR f0 estimate [29].
# Incorrectly estimated. The actual values are 4 (Maxima) and 5 (Volvo).
* Same vehicle.
¶ Calculated by removing one outlier in each method.
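
The summary rows of Table 1 can be checked with a short script such as the one below. The speed lists are copied from the table in row order. The use of the sample standard deviation (n - 1 normalization) and the rule that the removed outlier is the single largest absolute error are assumptions made here; under them, the computed values closely match the reported ones. The function and variable names are hypothetical.

```python
import statistics

# Speeds in m/s copied from Table 1, in the table's row order.
V_CAMERA = [17.54, 18.68, 16.74, 13.32, 4.14, 13.44, 13.94, 14.11, 17.56, 11.66]
V_PROFILE = [17.86, 18.60, 14.44, 13.20, 4.49, 13.89, 13.80, 14.69, 17.84, 11.58]
V_MULT = [28.00, 18.29, 17.34, 14.23, 4.75, 11.32, 15.51, 12.93, 17.02, 8.58]
V_ADD = [21.39, 15.05, 14.49, 14.27, 3.46, 11.79, 11.78, 11.22, 17.56, 8.66]


def error_summary(estimates, truth, drop_outlier=False):
    """Return (bias, error std) of estimates against the camera ground truth."""
    errors = [e - t for e, t in zip(estimates, truth)]
    if drop_outlier:
        # Assumption: the outlier is the entry with the largest absolute error.
        worst = max(range(len(errors)), key=lambda i: abs(errors[i]))
        errors = errors[:worst] + errors[worst + 1:]
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    return statistics.mean(errors), statistics.stdev(errors)


if __name__ == "__main__":
    for name, est in [("full profile", V_PROFILE), ("[2] + multiplicative", V_MULT),
                      ("[2] unchanged", V_ADD)]:
        print(name, error_summary(est, V_CAMERA), error_summary(est, V_CAMERA, True))
```

For the full profile vector, the removed entry under this rule is the Honda Accord, whose estimate is 2.30 m/s below the camera value.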




Fig. 7. (a) Estimated vehicle lengths are compared. There is a
clear separation between compact cars and large vehicles. (b)
Estimated vehicle sizes are compared. (c) The logarithm of the
vehicle signal amplitudes is plotted with respect to their speed.
There is a linear trend in the plot, as also indicated by [14, 15].
The solid line represents a least squares fit to the data without
the Nissan Maxima. The dotted lines are one standard deviation away
from the mean. The Nissan Maxima is louder than the other cars
because the vehicle has mechanical problems.
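
The loudness test suggested by Fig. 7(c) can be sketched as a least squares line fitted to log-amplitude versus speed for a set of reference vehicles, with a vehicle flagged when it lies more than one residual standard deviation above the line. This is an illustrative sketch only: the function names are hypothetical, the base of the logarithm and the exact definition of the one standard deviation band in the figure are not specified here, and any data passed in would have to come from the estimated profile vectors.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit y ~ slope * x + intercept, plus residual std."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    # Two fitted parameters, hence n - 2 degrees of freedom.
    resid_std = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, resid_std


def is_unusually_loud(speed, log_amplitude, model, n_std=1.0):
    """True if the vehicle lies more than n_std residual stds above the fit."""
    slope, intercept, resid_std = model
    return log_amplitude > slope * speed + intercept + n_std * resid_std
```

Fitting the line on reference vehicles only, and then testing a suspect vehicle against it, mirrors the figure, where the fit excludes the Nissan Maxima.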


6.  CONCLUSIONS 

We presented a method to determine a vehicle’s speed from its
acoustic drive-by sounds recorded at a microphone by formulating
the problem as a joint speed and acoustic pattern estimation
problem. We achieve this estimation using a vector that profiles
the directional variation of the vehicle acoustic pattern. The
vehicle profile vector enables a signal processor to better address
the vehicle correspondence problem since it provides unbiased speed
and loudness estimates as well as vehicle dimensions. It also
generates better discriminative features that are compressed into a
15-dimensional space. Parameters λ_v and λ_χ of the vehicle profile
vector can improve the confidence of the correspondence matches,
whereas their compression decreases communication between a
calibration microphone and a control microphone. However, as usual,
given the difficulty of the correspondence problem, one should not
expect superlative performance in all cases even with the vehicle
profile vector.

While determining the vehicle speed, we relied on signal power
calculations and argued that the signal frequency information
(Doppler) was not useful. On the other hand, when an array of
microphones is available, one can also draw inferences from the
phase of the received acoustic data across the array. In this case,
we expect the performance to improve beyond what one would obtain
from multiple independent amplitude observations. We envision that
when multiple vehicles are present, the array can provide the
acoustic steering necessary to remove the cocktail party effect on
the ES components. Hence, the approaches in the literature can be
improved to obtain unbiased speed estimates when an array is used.
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